Goto

Collaborating Authors

 mh algorithm


Rapidly Mixing Multiple-try Metropolis Algorithms for Model Selection Problems

Neural Information Processing Systems

The multiple-try Metropolis (MTM) algorithm is an extension of the Metropolis-Hastings (MH) algorithm by selecting the proposed state among multiple trials according to some weight function. Although MTM has gained great popularity owing to its faster empirical convergence and mixing than the standard MH algorithm, its theoretical mixing property is rarely studied in the literature due to its complex proposal scheme.


Score-Based Metropolis-Hastings Algorithms

Aloui, Ahmed, Hasan, Ali, Dong, Juncheng, Wu, Zihao, Tarokh, Vahid

arXiv.org Artificial Intelligence

In this paper, we introduce a new approach for integrating score-based models with the Metropolis-Hastings algorithm. While traditional score-based diffusion models excel in accurately learning the score function from data points, they lack an energy function, making the Metropolis-Hastings adjustment step inaccessible. Consequently, the unadjusted Langevin algorithm is often used for sampling using estimated score functions. The lack of an energy function then prevents the application of the Metropolis-adjusted Langevin algorithm and other Metropolis-Hastings methods, limiting the wealth of other algorithms developed that use acceptance functions. We address this limitation by introducing a new loss function based on the \emph{detailed balance condition}, allowing the estimation of the Metropolis-Hastings acceptance probabilities given a learned score function. We demonstrate the effectiveness of the proposed method for various scenarios, including sampling from heavy-tail distributions.


Dimension-free Relaxation Times of Informed MCMC Samplers on Discrete Spaces

Chang, Hyunwoong, Zhou, Quan

arXiv.org Machine Learning

Convergence analysis of Markov chain Monte Carlo methods in high-dimensional statistical applications is increasingly recognized. In this paper, we develop general mixing time bounds for Metropolis-Hastings algorithms on discrete spaces by building upon and refining some recent theoretical advancements in Bayesian model selection problems. We establish sufficient conditions for a class of informed Metropolis-Hastings algorithms to attain relaxation times that are independent of the problem dimension. These conditions are grounded in high-dimensional statistical theory and allow for possibly multimodal posterior distributions. We obtain our results through two independent techniques: the multicommodity flow method and single-element drift condition analysis; we find that the latter yields a tighter mixing time bound. Our results and proof techniques are readily applicable to a broad spectrum of statistical problems with discrete parameter spaces.


DQSSA: A Quantum-Inspired Solution for Maximizing Influence in Online Social Networks (Student Abstract)

Rao, Aryaman, Singh, Parth, Vishwakarma, Dinesh Kumar, Prasad, Mukesh

arXiv.org Artificial Intelligence

Influence Maximization is the task of selecting optimal nodes maximising the influence spread in social networks. This study proposes a Discretized Quantum-based Salp Swarm Algorithm (DQSSA) for optimizing influence diffusion in social networks. By discretizing meta-heuristic algorithms and infusing them with quantum-inspired enhancements, we address issues like premature convergence and low efficacy. The proposed method, guided by quantum principles, offers a promising solution for Influence Maximisation. Experiments on four real-world datasets reveal DQSSA's superior performance as compared to established cutting-edge algorithms.


Statistical guarantees for stochastic Metropolis-Hastings

Bieringer, Sebastian, Kasieczka, Gregor, Steffen, Maximilian F., Trabs, Mathias

arXiv.org Machine Learning

A Metropolis-Hastings step is widely used for gradient-based Markov chain Monte Carlo methods in uncertainty quantification. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study statistical properties of the resulting stationary distribution of the chain if the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality which yields optimal contraction rates and we analyze the diameter and show high coverage probability of the resulting credible sets. With a numerical example in a high-dimensional parameter space, we illustrate that credible sets and contraction rates of the stochastic Metropolis-Hastings algorithm indeed behave similar to those obtained from the classical Metropolis-adjusted Langevin algorithm.


Metropolis-Hastings algorithm in joint-attention naming game: Experimental semiotics study

Okumura, Ryota, Taniguchi, Tadahiro, Hagiwara, Yosinobu, Taniguchi, Akira

arXiv.org Artificial Intelligence

In this study, we explore the emergence of symbols during interactions between individuals through an experimental semiotic study. Previous studies investigate how humans organize symbol systems through communication using artificially designed subjective experiments. In this study, we have focused on a joint attention-naming game (JA-NG) in which participants independently categorize objects and assign names while assuming their joint attention. In the theory of the Metropolis-Hastings naming game (MHNG), listeners accept provided names according to the acceptance probability computed using the Metropolis-Hastings (MH) algorithm. The theory of MHNG suggests that symbols emerge as an approximate decentralized Bayesian inference of signs, which is represented as a shared prior variable if the conditions of MHNG are satisfied. This study examines whether human participants exhibit behavior consistent with MHNG theory when playing JA-NG. By comparing human acceptance decisions of a partner's naming with acceptance probabilities computed in the MHNG, we tested whether human behavior is consistent with the MHNG theory. The main contributions of this study are twofold. First, we reject the null hypothesis that humans make acceptance judgments with a constant probability, regardless of the acceptance probability calculated by the MH algorithm. This result suggests that people followed the acceptance probability computed by the MH algorithm to some extent. Second, the MH-based model predicted human acceptance/rejection behavior more accurately than the other four models: Constant, Numerator, Subtraction, and Binary. This result indicates that symbol emergence in JA-NG can be explained using MHNG and is considered an approximate decentralized Bayesian inference.


Importance is Important: A Guide to Informed Importance Tempering Methods

Li, Guanxun, Smith, Aaron, Zhou, Quan

arXiv.org Machine Learning

Informed importance tempering (IIT) is an easy-to-implement MCMC algorithm that can be seen as an extension of the familiar Metropolis-Hastings algorithm with the special feature that informed proposals are always accepted, and which was shown in Zhou and Smith (2022) to converge much more quickly in some common circumstances. This work develops a new, comprehensive guide to the use of IIT in many situations. First, we propose two IIT schemes that run faster than existing informed MCMC methods on discrete spaces by not requiring the posterior evaluation of all neighboring states. Second, we integrate IIT with other MCMC techniques, including simulated tempering, pseudo-marginal and multiple-try methods (on general state spaces), which have been conventionally implemented as Metropolis-Hastings schemes and can suffer from low acceptance rates. The use of IIT allows us to always accept proposals and brings about new opportunities for optimizing the sampler which are not possible under the Metropolis-Hastings framework. Numerical examples illustrating our findings are provided for each proposed algorithm, and a general theory on the complexity of IIT methods is developed.


Provable Convergence of Variational Monte Carlo Methods

Li, Tianyou, Chen, Fan, Chen, Huajie, Wen, Zaiwen

arXiv.org Machine Learning

The Variational Monte Carlo (VMC) is a promising approach for computing the ground state energy of many-body quantum problems and attracts more and more interests due to the development of machine learning. The recent paradigms in VMC construct neural networks as trial wave functions, sample quantum configurations using Markov chain Monte Carlo (MCMC) and train neural networks with stochastic gradient descent (SGD) method. However, the theoretical convergence of VMC is still unknown when SGD interacts with MCMC sampling given a well-designed trial wave function. Since MCMC reduces the difficulty of estimating gradients, it has inevitable bias in practice. Moreover, the local energy may be unbounded, which makes it harder to analyze the error of MCMC sampling. Therefore, we assume that the local energy is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, VMC is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the so-called correlated negative curvature condition and relate it to the zero-variance phenomena in solving eigenvalue functions. It is shown that VMC escapes from saddle points and reaches $(\epsilon,\epsilon^{1/4})$ -approximate second order stationary points or $\epsilon^{1/2}$-variance points in at least $O(\epsilon^{-11/2}\log^{2}(1/\epsilon) )$ steps with high probability. Our analysis enriches the understanding of how VMC converges efficiently and can be applied to general variational methods in physics and statistics.


Deep Learning Aided Laplace Based Bayesian Inference for Epidemiological Systems

Kwok, Wai Meng, Dass, Sarat Chandra, Streftaris, George

arXiv.org Machine Learning

Parameter estimation and associated uncertainty quantification is an important problem in dynamical systems characterized by ordinary differential equation (ODE) models that are often nonlinear. Typically, such models have analytically intractable trajectories which result in likelihoods and posterior distributions that are similarly intractable. Bayesian inference for ODE systems via simulation methods require numerical approximations to produce inference with high accuracy at a cost of heavy computational power and slow convergence. At the same time, Artificial Neural Networks (ANN) offer tractability that can be utilized to construct an approximate but tractable likelihood and posterior distribution. In this paper we propose a hybrid approach, where Laplace-based Bayesian inference is combined with an ANN architecture for obtaining approximations to the ODE trajectories as a function of the unknown initial values and system parameters. Suitable choices of a collocation grid and customized loss functions are proposed to fine tune the ODE trajectories and Laplace approximation. The effectiveness of our proposed methods is demonstrated using an epidemiological system with non-analytical solutions, the Susceptible-Infectious-Removed (SIR) model for infectious diseases, based on simulated and real-life influenza datasets. The novelty and attractiveness of our proposed approach include (i) a new development of Bayesian inference using ANN architectures for ODE based dynamical systems, and (ii) a computationally fast posterior inference by avoiding convergence issues of benchmark Markov Chain Monte Carlo methods. These two features establish the developed approach as an accurate alternative to traditional Bayesian computational methods, with improved computational cost.


A survey of Monte Carlo methods for noisy and costly densities with application to reinforcement learning

Llorente, F., Martino, L., Read, J., Delgado, D.

arXiv.org Machine Learning

This survey gives an overview of Monte Carlo methodologies using surrogate models, for dealing with densities which are intractable, costly, and/or noisy. This type of problem can be found in numerous real-world scenarios, including stochastic optimization and reinforcement learning, where each evaluation of a density function may incur some computationally-expensive or even physical (real-world activity) cost, likely to give different results each time. The surrogate model does not incur this cost, but there are important trade-offs and considerations involved in the choice and design of such methodologies. We classify the different methodologies into three main classes and describe specific instances of algorithms under a unified notation. A modular scheme which encompasses the considered methods is also presented. A range of application scenarios is discussed, with special attention to the likelihood-free setting and reinforcement learning. Several numerical comparisons are also provided.